PREFACE

The Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing 
is a multiyear program of research, development, evaluation, and application of 
aerospace remote sensing for agricultural resources, which began in fiscal year 
1980. This program is a cooperative effort of the U.S. Department of 
Agriculture, the National Aeronautics and Space Administration, the National 
Oceanic and Atmospheric Administration (U.S. Department of Commerce), the Agency 
for International Development (U.S. Department of State), and the 
U.S. Department of the Interior. 

The work which is the subject of this document was performed by the Earth 
Resources Applications Division, Space and Life Sciences Directorate, Lyndon B. 
Johnson Space Center, National Aeronautics and Space Administration and Lockheed
Engineering and Management Services Company, Inc. The tasks performed by 
Lockheed Engineering and Management Services Company, Inc., were accomplished 
under Contract NAS 9-15800. 

1. BACKGROUND 

The Foreign Commodity Production Forecasting project of the Agriculture and 
Resources Inventory Surveys Through Aerospace Remote Sensing (AgRISTARS) pro- 
gram was responsible for developing and testing procedures for using aerospace 
remote sensing technology to provide more objective, timely, and reliable crop 
production forecasts. One of the components of production estimation is 
segment area estimation. Since large-area acreage estimates for small grains 
depend upon segment-level proportion estimates, it is important that those 
proportion estimates be as accurate and precise as possible. Prior to the
AgRISTARS program, several procedures were tested in an attempt to find an
accurate and efficient method for estimating small-grain proportions. In the
resulting method, Procedure 1 (P1), labels were used in the random selection of
training pixels to start a clustering algorithm. Then, cluster statistics were
used to produce a maximum likelihood classification of the scene into 2- or
3-class strata. Finally, stratified proportion estimates were made using a
second random set of labeled dots. However, this classification component
provided no better results than those which could have been produced through
simple random sampling. Thus, clustering had not been an effective method.

Consequently, a new clustering algorithm was developed (refs. 1 and 2).
Previously, clusters were used to define distributions in the data; the new
algorithm used clusters to generate strata within which crop proportions could
be estimated. One advantage of this algorithm was that, because it is an
unsupervised routine, it did not need the first set of training dots that P1
required.

In addition, a proportion estimation technique (ref. 3) which used the clusters 
of this algorithm was developed. This technique involved Bayesian estimation 
of cluster-level proportions based on historical information concerning cluster 
purities. The cluster-level estimates were then weighted by their relative 
cluster sizes and aggregated to produce the segment-level estimate. Use of 
this technique was expected to provide better proportion estimates. The
technique also implemented sequential sampling in an attempt to sample the segment
clusters more effectively and further reduce the expected mean squared error 
(MSE) of the proportion estimation. 
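The cluster-level Bayesian estimate and its size-weighted aggregation can be sketched as follows. This is a minimal sketch, not the estimator of ref. 3: it assumes a conjugate beta prior on each cluster's small-grain proportion, so that each cluster estimate is the beta-binomial posterior mean, and it borrows the prior parameters alpha = 0.3513 and beta = 1 reported in appendix D. The `Cluster` record, the function names, and the example segment are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    size: int           # pixels in the cluster (used as its weight)
    n_labeled: int      # labeled dots sampled from the cluster
    n_small_grain: int  # of those, dots labeled small grain


def bayes_cluster_estimate(c: Cluster, alpha: float, beta: float) -> float:
    """Posterior mean of a cluster's small-grain proportion under a Beta(alpha, beta) prior."""
    return (alpha + c.n_small_grain) / (alpha + beta + c.n_labeled)


def relative_count_estimate(c: Cluster) -> float:
    """Relative count (maximum likelihood) estimate, for comparison."""
    return c.n_small_grain / c.n_labeled


def segment_estimate(clusters, cluster_estimator) -> float:
    """Weight each cluster-level estimate by its relative cluster size and sum."""
    total = sum(c.size for c in clusters)
    return sum(c.size / total * cluster_estimator(c) for c in clusters)


# Hypothetical segment with three clusters.
clusters = [Cluster(600, 10, 1), Cluster(300, 6, 5), Cluster(100, 4, 0)]
bayes = segment_estimate(clusters, lambda c: bayes_cluster_estimate(c, 0.3513, 1.0))
rce = segment_estimate(clusters, relative_count_estimate)
print(f"Bayesian: {bayes:.3f}  relative count: {rce:.3f}")
```

Under this assumed prior, each cluster estimate is pulled toward the prior mean of about 0.26, most strongly in clusters with few labeled dots; this is one way a Bayesian estimator can trade a small bias for lower variance.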




Characteristic of this new estimation technique, the Bayesian Sequential
Allocation/Bayesian Estimator (BSA/BE), was the selection of dots one at a
time. The sampling technique was an attempt to minimize the MSE of the
proportion estimate. Before each dot was sampled, the expected effect on the
MSE estimate was computed for each cluster; on the basis of these estimates, the
next dot was taken from the cluster expected to reduce the MSE the most. This
manner of sampling provided an additional feature: the option of sampling with
a fixed sample size or of varying the sample size from segment to segment. The
sample size could be varied by halting the sampling when the internal MSE
estimate reached a predetermined threshold. Varying sample sizes in this manner
was intended to provide uniform accuracy across segments by sampling more
heavily from more "difficult" segments.
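One way to sketch the sequential allocation loop is shown below. It is a hedged illustration, not the BSA/BE software: it assumes each cluster's proportion has a beta posterior (with the appendix D prior), treats the internal MSE estimate as the posterior variance of the size-weighted segment proportion with clusters independent, and picks the cluster whose next dot gives the largest expected reduction in that variance. The defaults of two initial dots per cluster, a 50-dot fixed sample, and a 0.0020 threshold are taken from the Method section; `label_dot`, which stands in for the analyst, and the other names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def beta_var(a: float, b: float) -> float:
    """Variance of a Beta(a, b) distribution."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def expected_var_after_one(a: float, b: float) -> float:
    """Expected posterior variance after labeling one more dot (preposterior analysis)."""
    p = a / (a + b)  # predictive probability that the next dot is small grain
    return p * beta_var(a + 1, b) + (1 - p) * beta_var(a, b + 1)


def sequential_allocation(
    weights: List[float],               # relative cluster sizes, summing to 1
    label_dot: Callable[[int], int],    # hypothetical: label a new dot from cluster j (1 = small grain)
    alpha: float = 0.3513,
    beta: float = 1.0,
    min_per_cluster: int = 2,
    max_dots: int = 50,
    mse_threshold: Optional[float] = None,
) -> Tuple[float, List[int], float]:
    k = len(weights)
    a, b, n = [alpha] * k, [beta] * k, [0] * k

    def take(j: int) -> None:
        y = label_dot(j)
        a[j] += y
        b[j] += 1 - y
        n[j] += 1

    def internal_mse() -> float:
        # Posterior variance of sum_j w_j * theta_j, assuming independent clusters.
        return sum(w * w * beta_var(a[j], b[j]) for j, w in enumerate(weights))

    # Seed every cluster so that a variance estimate exists for each one.
    for j in range(k):
        for _ in range(min_per_cluster):
            take(j)

    while sum(n) < max_dots:
        if mse_threshold is not None and internal_mse() < mse_threshold:
            break  # variable-sample-size mode: stop once the threshold is met
        gains = [
            weights[j] ** 2 * (beta_var(a[j], b[j]) - expected_var_after_one(a[j], b[j]))
            for j in range(k)
        ]
        take(max(range(k), key=gains.__getitem__))

    estimate = sum(w * a[j] / (a[j] + b[j]) for j, w in enumerate(weights))
    return estimate, n, internal_mse()


# Hypothetical analyst: cluster 1 is all small grain, the others contain none.
est, counts, mse = sequential_allocation([0.5, 0.3, 0.2], lambda j: 1 if j == 1 else 0)
```

Passing `mse_threshold=0.0020` switches the sketch from the fixed-sample mode to the variable-sample mode; the expected reduction is never negative, because the expected posterior variance cannot exceed the current variance.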

A 10-segment development test of the BSA/BE (ref. 4) showed at least a 2-to-1
reduction in MSE relative to P1, a reduction in proportion estimation error,
and improved analyst labeling accuracy.
2. APPROACH

Flow diagrams of the BSA/BE technique and P1 are presented in figures 2-1 and
2-2, respectively. Table 2-1 shows the four steps involved in stratified areal
estimation and compares the BSA/BE to P1 at each step. The BSA/BE differs from
P1 at three of the four steps: whereas P1 uses approximately proportional
allocation of sample dots to Iterative Self-Organizing Clustering System
(ISOCLS) clusters and a relative count estimator of cluster-level proportions,
the BSA/BE technique uses sequential allocation of sample dots to CLASSY
clusters and a Bayesian estimator of cluster-level proportions. Incorporating
only step 1 of the BSA/BE into P1 (that is, substituting CLASSY clustering for
ISOCLS clustering) and allocating sample dots to clusters in proportion to
cluster size defines a new estimation technique, the Proportional
Allocation/Relative Count Estimator (PA/RCE). Additionally incorporating step 3
of the BSA/BE defines the Proportional Allocation/Bayesian Estimator (PA/BE)
technique. Both of these techniques were included for testing in this
experiment. A fourth technique, the Random Sampling/Relative Count Estimator
(RS/RCE), was also included. The RS/RCE, which randomly samples the entire
scene without regard to clusters and employs a relative count estimator of
segment-level proportions, was included because P1 had not proved to be
significantly better than the RS/RCE. The PA/RCE was included to determine the
effectiveness of CLASSY clustering, and the PA/BE was included to determine the
effect of the cluster-level Bayesian estimator with proportional allocation.
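The RS/RCE and the PA/RCE differ in how dots are allocated and how the counts are combined. The sketch below is a hypothetical illustration, not the project's software: it uses largest-remainder rounding to make the allocation approximately proportional, and it drops clusters that receive no dots and renormalizes the remaining weights; both choices are assumptions, and the function names and example sizes are made up.

```python
from typing import List


def proportional_allocation(sizes: List[int], n_total: int) -> List[int]:
    """Allocate n_total dots to clusters in proportion to size (largest-remainder rounding)."""
    total = sum(sizes)
    quotas = [n_total * s / total for s in sizes]
    alloc = [int(q) for q in quotas]
    leftover = n_total - sum(alloc)
    # Give the remaining dots to the clusters with the largest fractional remainders.
    order = sorted(range(len(sizes)), key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in order[:leftover]:
        alloc[k] += 1
    return alloc


def rs_rce(n_small_grain: int, n_dots: int) -> float:
    """RS/RCE: fraction of randomly sampled dots labeled small grain."""
    return n_small_grain / n_dots


def stratified_rce(sizes: List[int], n_small_grain: List[int], n_dots: List[int]) -> float:
    """PA/RCE: size-weighted average of cluster-level relative counts."""
    sampled = [(s, x, m) for s, x, m in zip(sizes, n_small_grain, n_dots) if m > 0]
    total = sum(s for s, _, _ in sampled)  # renormalize over clusters that received dots
    return sum(s / total * x / m for s, x, m in sampled)


sizes = [600, 300, 100]                       # hypothetical cluster sizes (pixels)
alloc = proportional_allocation(sizes, 50)    # -> [30, 15, 5]
print(alloc, stratified_rce(sizes, [3, 12, 1], alloc))
```

Adding step 3 of the BSA/BE (the PA/BE) would keep `proportional_allocation` and replace the relative count `x / m` inside `stratified_rce` with a cluster-level Bayesian estimate.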

For each of these four techniques, the input dot sets had labels from one of
three possible sources: the integrated labeling procedure (ref. 5), the
reformatted labeling procedure (ref. 6), or ground-truth data. Combining the
four techniques with the three sources of dot labels and the two sample-size
requirements (fixed or variable) yielded 24 estimates for each segment. The
effect of these three factors on the estimates was to be determined.
TABLE 2-1.- PROCEDURE 1 COMPARED TO THE BAYESIAN SEQUENTIAL
ALLOCATION/BAYESIAN ESTIMATOR (BSA/BE) TECHNIQUE 
Figure 2-1.- Segment analysis using Bayes Sequential Allocation procedure.
Examination of the effects of the different techniques will, in essence,
measure (a) the effect of stratified random sampling of CLASSY clusters, with
dots allocated in proportion to cluster size, rather than random sampling of
the entire scene in estimating spring small-grain proportions; (b) the effect of
Bayesian procedures rather than relative frequency in estimating cluster-level
proportions; and (c) the effect of Bayesian Sequential Allocation rather than
proportional allocation in estimating spring small-grain proportions (ref. 7).


3. METHOD 


The dot sets from which samples were taken contained dots on one of the four
major grids or alternates for grid dots. Enough dots were labeled from each
segment so that 75 dots were allocated proportionally to the clusters; this was
usually the 209 dots from the first grid plus a few (1 to 10) from grid 2.
This ensured that each cluster would have enough dots for sequential
allocation. If a grid dot was determined to be a boundary dot, an alternate dot
was substituted for labeling purposes, since boundary dots present special
labeling problems. Pure dots have been found to have higher labeling accuracies
than boundary dots, but ignoring boundary dots by using only pure grid dots in
proportion estimation could bias results (refs. 8 and 9). From these dot sets,
sample dots were taken for proportion estimation.

Two separate sets of estimates were made for 35 spring wheat segments: one
with a fixed sample size of 50 dots, and one with sample sizes allowed to vary
from segment to segment.

To permit variable sample sizes, two dots were first allocated automatically to
each cluster so that MSE estimates could be obtained. A threshold was then set
on the internal segment MSE estimate, $\mathrm{MSE} = E(\hat{p} - p)^2 < 0.0020$.
When this threshold was reached, sampling was halted. To make the techniques
comparable, the same sample size was then used to obtain proportion estimates
with the other techniques. Thus, while the sample size could vary from segment
to segment, it was constant across the techniques for any particular segment.
4. RESULTS


Because there were insufficient data (only nine segments were processible using
the reformatted procedure), the part of the evaluation that would use labels
from the reformatted labeling procedure is not considered here. The results of
the four estimation techniques with labels from the reformatted procedure are,
however, presented in appendix A. Only results obtained with integrated-procedure
labels or ground-truth labels were considered in the evaluation.

Although estimates were made with fixed and variable sample sizes, emphasis 
during the evaluation was placed on the fixed sample case. Results of the 
variable sample case were comparable to those of the fixed sample case; these 
results, which include biases, MSE's, and plots of proportion estimation 
errors, are presented in appendix B. Further discussion of the analysis and 
results will concern only the fixed sample case for input dot sets with labels 
from the integrated procedure or ground-truth data. 

Tables 4-1 and 4-2 present biases of proportion estimates, standard deviations 
of estimate errors, and MSE's for all 35 segments when dot labels from the 
integrated procedure were input. The errors are shown in figure 4-1 
(ground-truth proportions for these segments are presented in appendix C). 
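The summary statistics in tables 4-1 and 4-2 can be reproduced from per-segment errors with a few lines of code. The definitions below are inferred from the published numbers rather than stated in the report: the MSE entries in table 4-1 agree with bias^2 + ((n - 1)/n) s^2 for n = 35 segments, where s is the sample standard deviation of the errors, and the relative bias and RV entries in table 4-2 agree with dividing by the average proportion estimate. The function names are hypothetical.

```python
import math
from typing import Dict, List


def error_summary(estimates: List[float], truths: List[float]) -> Dict[str, float]:
    """Bias, sample SD of errors, MSE, relative bias and relative variation (percent units)."""
    errors = [e - t for e, t in zip(estimates, truths)]
    n = len(errors)
    bias = sum(errors) / n
    sd = math.sqrt(sum((x - bias) ** 2 for x in errors) / (n - 1))  # sample SD
    mse = sum(x * x for x in errors) / n                              # mean squared error
    mean_est = sum(estimates) / n
    return {
        "bias": bias,
        "sd": sd,
        "mse": mse,
        "relative_bias_pct": 100 * bias / mean_est,
        "rv_pct": 100 * sd / mean_est,
    }


def mse_from_bias_sd(bias: float, sd: float, n: int) -> float:
    """Identity linking the bias, standard deviation, and MSE columns."""
    return bias ** 2 + (n - 1) / n * sd ** 2


# Check against the RS/RCE row with AI labels (bias -5.7, SD 7.7, MSE 90; mean estimate 23.4).
print(round(mse_from_bias_sd(-5.7, 7.7, 35)))   # 90
print(round(100 * -5.7 / 23.4, 1))              # -24.4
print(round(100 * 7.7 / 23.4, 1))               # 32.9
```

The same identity reproduces the other rows of table 4-1 to within rounding.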

On the basis of analyst-interpreter (AI) labels, the PA/RCE technique provided 
a significantly less biased estimate and produced less variable errors than did 
random sampling. The fact that the errors were less variable showed that the 
clustering algorithm had been effective. 

When ground-truth labels were input, the errors produced using the PA/RCE were
less variable than those of random sampling (table 4-1 and figure 4-2); the
disturbing result, however, was the significant bias produced by random
sampling.

TABLE 4-1.- ACCURACY AND PRECISION OF THE INTEGRATED
PROCEDURE WITH AI LABELS AND GROUND-TRUTH LABELS

                                      AI labels              Ground-truth labels
Technique                         Bias  Std. dev.  MSE     Bias  Std. dev.  MSE
Random Sampling/
  Relative Count Estimator        -5.7     7.7      90     -2.5     6.9      53
Proportional Allocation/
  Relative Count Estimator        -4.0     6.2      53      0.0     4.0      16
Proportional Allocation/
  Bayesian Estimator              -3.5     6.0      47      0.5     3.8      14
Bayesian Sequential Allocation/
  Bayesian Estimator              -2.7     6.8      52      0.4     4.7      22


TABLE 4-2.- RELATIVE ACCURACY AND PRECISION OF THE INTEGRATED
PROCEDURE WITH AI LABELS AND GROUND-TRUTH LABELS

                                        AI labels                Ground-truth labels
                                  Avg.     Relative  RV       Avg.     Relative  RV
Technique                         est.(a)  bias,%(b) (c)      est.(a)  bias,%(b) (c)
Random Sampling/
  Relative Count Estimator        23.4     -24.4     32.9     26.6     -9.4      25.9
Proportional Allocation/
  Relative Count Estimator        25.1     -15.9     24.7     29.1      0.0      13.7
Proportional Allocation/
  Bayesian Estimator              25.6     -13.7     23.4     29.6      1.7      12.8
Bayesian Sequential Allocation/
  Bayesian Estimator              26.4     -10.2     25.8     29.5      1.4      15.9

(a) Average proportion estimate, $\bar{\hat{p}}$, in percent.
(b) Relative bias $= 100\% \times (\bar{\hat{p}} - \bar{p})/\bar{\hat{p}}$, where $\bar{p}$ is the average ground-truth proportion.
(c) $\mathrm{RV} = 100 \times \sigma/\bar{\hat{p}}$ = relative variation, where $\sigma$ is the standard deviation of the estimate errors.


With ground-truth labels input, random sampling was expected to provide an
unbiased estimate. Ground-truth labels were input to determine the effect of
techniques with unbiased estimators on the variability of errors and the effect
of techniques with biased estimators on both the proportion estimates and the
variability of errors. However, random sampling, an unbiased technique,
produced a significant underestimate even when ground-truth labels were input.
To determine the reason for this result, the biases of the 209-plus pixel input
dot sets were examined, since these were the sets from which the 50-dot samples
were taken. The bias (over all 35 segments) was found to be -0.8 percent, and
the estimate produced by random sampling was not significantly biased with
respect to this value. This indicates that the use of the PA/RCE technique
resulted in an overestimation of the 209-plus dot proportion estimates by
0.8 percent. While this was not a significant overestimate, it should be
noted. The important result was the reduction in error variability achieved by
the PA/RCE relative to random sampling when AI labels and ground-truth labels
were input. This reduction was attributed to CLASSY clustering.

Cluster purities are further discussed in appendix D. 

Since clustering was effective, the next step was to determine the effect of a
Bayesian estimator. For the PA/BE, the same dots that were used for the PA/RCE
were used again. Thus, the only difference between the two techniques was the
estimator employed: with the PA/BE, a cluster-level Bayesian estimator was used
instead of a relative count estimator. It had been hypothesized that the PA/BE
would provide improved proportion estimates over the PA/RCE because prior
knowledge of cluster purities was being taken into account, in the same way
that the PA/RCE was expected to provide more accurate proportion estimates than
random sampling because it used clustering information. As hypothesized, there
seemed to be improved precision, but the difference was small (table 4-1).
Figure 4-3 shows the difference between the PA/BE and the PA/RCE for all 35
segments.

A positive difference indicates that the PA/BE produced the larger estimate. 

As the PA/RCE estimate increased, there was a tendency for a larger positive 
difference. Whether AI labels or ground-truth labels were input, the PA/BE 
produced a mean proportion estimate that was five-tenths of a percent larger 
than that of the PA/RCE. This was attributed to a tendency for positive 
biasing (with respect to the PA/RCE) by the Bayesian estimator (figure 4-3). 
Figure 4-3.- Differences in estimates using proportional allocation
with and without Bayesian estimation. 
The net effect was a reduction of the negative bias when AI labels were input.
Along with the positive biasing, there was a slight reduction (0.2 percent) in
error variability from that of the PA/RCE. This was the case both when AI
labels were input and when ground-truth labels were input. In both cases, the
MSE's of the PA/BE were slightly lower than those of the PA/RCE. These results
were encouraging because they supported the expectation that Bayesian
estimation at the cluster level would provide greater precision than maximum
likelihood estimation, although at the cost of slightly biased results.

The final technique was the BSA/BE. As can be seen in table 4-1, it was the
least biased technique when AI labels were input. This had been hypothesized,
since the dots were allocated to clusters one at a time with the intention of
minimizing the MSE. Although it produced the least biased results, as
hypothesized, the BSA/BE produced more variable results than did proportional
allocation. This was a disturbing observation.

To study these results further, an attempt was made to separate the effects of
Bayesian estimation and sequential allocation. To determine whether the results
of the BSA/BE followed those of the PA/BE when compared to an unbiased
estimation technique, estimates were made using the same sequentially allocated
dots and cluster information with a relative count cluster-level estimator
(BSA/RCE) rather than the Bayesian estimator. Using the Bayesian estimator in
the proportion estimation process increased the estimates by approximately
2 percent, whether the input labels were from AI's or from ground-truth data
(table 4-3). As in proportional allocation, Bayesian estimation produced less
variable results at the expense of biasing. With sequential allocation,
however, this bias was larger than with proportional allocation. A graph
comparing the two sequential estimates for each of the 35 segments is presented
in figure 4-4. Notice that there was greater overestimation for segments with
smaller amounts of small grain.
TABLE 4-3.- ACCURACY AND PRECISION OF SEQUENTIAL ALLOCATION

                                      AI labels              Ground-truth labels
Technique                         Bias  Std. dev.  MSE     Bias  Std. dev.  MSE
Sequential allocation
  (relative count,
  cluster-level estimate)         -4.9     7.1      73     -1.7     5.3      30
Sequential allocation
  (Bayesian cluster-level
  estimate)                       -2.7     6.8      52     +0.4     4.7      22
The fact that the BSA/BE produced more variable results than did the PA/BE was
due, in part, to a decreased overall labeling accuracy (table 4-4). To
determine whether these differences were significant, the difference between
the labeling accuracy of each segment's sample and that of all labeled dots in
the segment was found. The means of these differences are shown in table 4-5.
While there was a significant improvement in small-grain labeling accuracy,
there was a simultaneous decrease in nonsmall-grain labeling accuracy. The
result was a slight decline in total labeling accuracy.
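The report does not name the test behind the 10-percent significance flags in table 4-5. A paired t-test on the per-segment differences is one plausible choice; the sketch below assumes it, together with 35 segments (34 degrees of freedom) and a two-sided test. The critical value 1.691 is the Student's t quantile for those assumptions, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Tuple

# Two-sided 10-percent critical value of Student's t with 34 degrees of freedom.
T_CRIT_10PCT_DF34 = 1.691


def paired_mean_difference(sample_acc: List[float], overall_acc: List[float]) -> Tuple[float, float]:
    """Mean of per-segment differences (sample minus all labeled dots) and its t statistic."""
    diffs = [s - o for s, o in zip(sample_acc, overall_acc)]
    mean = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean / std_err


def is_significant(t_stat: float, critical: float = T_CRIT_10PCT_DF34) -> bool:
    """Two-sided test: flag the mean difference if |t| exceeds the critical value."""
    return abs(t_stat) > critical
```

Each list would hold one labeling accuracy (in percent) per segment; the default critical value only applies when there are exactly 35 segments.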

These results indicate that, with a small sample of 50 dots, proportional
allocation is the sampling method that produces the most precise and reliable
estimates. A slight reduction in variability can be gained, at the cost of a
slight bias, by using the Bayesian estimation technique.

Although CLASSY clustering was effective (that is, proportional allocation of
dots to CLASSY clusters resulted in greater precision for a given sample size),
the same precision could be obtained by random sampling, without the need for
clustering information, if a large enough sample were taken. If dot sets with
AI labels were input at the present labeling accuracy, a random sample of
85 dots would be required to obtain the precision of 50 dots proportionally
sampled from CLASSY clusters. If labeling were perfect, a random sample of
166 dots would be required to obtain the precision of 50 dots proportionally
allocated to CLASSY clusters.
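The equivalent sample sizes quoted above can be reproduced from the MSE's in table 4-1 if one assumes that MSE falls in inverse proportion to sample size, so that a random sample needs n_PA x MSE_RS / MSE_PA dots to match the proportional sample. The report does not spell out this calculation; the sketch is a hedged reconstruction, and its assumption ignores bias, which does not shrink as the sample grows. The function name is hypothetical.

```python
import math


def equivalent_random_sample_size(n_stratified: int, mse_random: float, mse_stratified: float) -> int:
    """Random-sample size whose MSE would match the stratified sample, assuming MSE ~ 1/n."""
    return math.ceil(n_stratified * mse_random / mse_stratified)


print(equivalent_random_sample_size(50, 90, 53))  # AI labels: 85
print(equivalent_random_sample_size(50, 53, 16))  # ground-truth labels: 166
```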

Therefore, the biases of proportion estimates, standard deviations of errors,
and MSE's of all available labeled dots from the 209 pixels were found, both
when dot sets with AI labels were input and when dot sets with ground-truth
labels were input. Table 4-6 presents the results obtained when those dots were
treated as a random sample. These dots were expected to provide greater
precision than a 50-dot proportional sampling of CLASSY clusters because of the
larger sample size. As expected, when all available labeled dots were used, the
RS/RCE showed less variable errors than the PA/RCE did with only 50-dot samples
allocated to CLASSY clusters. Notice in table 4-6 that the use of alternate
dots did not introduce a bias; the mean error was very small when ground-truth
labels were used. This was important, since analysts substituted alternate dots
for boundary dots in both the integrated and reformatted labeling procedures to
provide better labeling targets and avoid the special labeling problems that
boundary dots present.


TABLE 4-4.- LABELING ACCURACY

                     Random     Proportional   Sequential   All
Technique            sampling   allocation     allocation   labeled dots
Small grains          72.06        73.30          75.10        72.56
Nonsmall grains       93.64        94.75          91.62        93.54
Total                 88.09        88.62          85.40        87.54


TABLE 4-5.- MEAN DIFFERENCES OF SAMPLE LABELING
ACCURACY FROM OVERALL LABELING ACCURACY

                     Random     Proportional   Sequential
Technique            sampling   allocation     allocation
Small grains           0.93                        3.14*
Nonsmall grains       -0.07                       -2.01*
Total                  0.45                       -1.18

*Indicates a significant difference at the 10-percent level of significance.


TABLE 4-6.- ACCURACY AND PRECISION OF A RANDOM
SAMPLE OF AVAILABLE 209 DOTS

                                   AI labels              Ground-truth labels
Dots                           Bias  Std. dev.  MSE     Bias  Std. dev.  MSE
Random sample
  (all labeled dots)           -3.9     5.8      48     -0.8     2.9       9
Proportional sampling          -4.0     6.2      53      0.0     4.0      16

To determine the effect of clustering with larger samples, cluster-level
proportion estimates were made with a relative count estimator on the basis of
all labeled dots and were weighted by their cluster sizes to produce
segment-level estimates. These results are shown in table 4-7. As can be seen,
clustering had little effect on the accuracy or precision of estimates when
these larger samples were taken. These results point to labeling errors as the
limiting element in precision.
TABLE 4-7.- ACCURACY AND PRECISION OF ALL LABELED
DOTS WHEN WEIGHTED BY CLUSTER SIZE

                                   AI labels              Ground-truth labels
Dots                           Bias  Std. dev.  MSE     Bias  Std. dev.  MSE
All labeled dots (weighted)             5.7      48              2.5      6.3
All labeled dots (random)      -3.9     5.8      48     -0.8     2.9       9
Proportional sampling          -4.0     6.2      53      0.0     4.0      16
5. SUMMARY AND CONCLUSIONS

For the first time in Foreign Commodity Production Forecasting (FCPF) project
testing, clustering has been an effective method in making proportion
estimates. Proportionally allocating 50 dots to CLASSY clusters to estimate
proportions resulted in greater precision than using a random sample of
50 dots. This was observed both when dot sets with AI labels from the integrated
procedure were input and when dot sets with ground-truth labels were input.

When a cluster-level Bayesian estimator (rather than a relative count estimator) 
was employed with proportional allocation, errors of proportion estimates were 
slightly less variable at the expense of a slight positive bias with respect to 
the estimate of the PA/RCE technique. When dot sets with AI labels from the 
integrated procedure were input, the results of the PA/BE were less biased with 
respect to ground-truth proportions. Whether analyst-labeled dot sets or 
ground-truth labeled dot sets were input, the net result was a reduction in the 
MSE. 

The BSA/BE provided the least amount of bias with respect to ground-truth
proportions when analyst-labeled dot sets were input. However, this was due to
positive biasing by the Bayesian estimator with respect to an unbiased estimate 
based on the same dots, also weighted by cluster size. The magnitude of this 
bias was approximately 2 percent. This same effect was observed when dot sets 
with ground-truth labels were input. In addition, the errors of estimates from 
the Sequential Bayesian technique showed greater variability than did those 
from proportional sampling. This was attributed, in part, to a reduced overall 
labeling accuracy observed for dots selected through sequential allocation. 
It was estimated that, to obtain with random sampling the same precision as
that obtained by the proportional sampling of 50 dots with an unbiased
estimator, random samples of 85 or 166 dots would need to be taken if dot sets
with AI labels (integrated procedure) or ground-truth labels, respectively,
were input.

On the other hand, little difference was observed between random sampling and
cluster-weighted estimates when all available labeled dots from the 209 were
input. Another important result is that dot relocation by analysts provided
dot sets that were unbiased.


6. RECOMMENDATIONS 


While automatic labeling would provide large samples at relatively low cost,
it is only a goal. With large samples, these clustering procedures do not seem
to provide much improvement in proportion estimation. However, it is not
recommended that effective clustering algorithms be discarded, nor should
proportion estimation techniques default to random sampling.

An effective procedure using clustering information is available for use in
testing and for future development. Automatic labeling, it should be
remembered, is not yet a reality. It is therefore recommended that these
proportion estimation techniques be maintained, particularly the PA/BE because
it provided the greatest precision. It is also recommended that this estimation
procedure be considered as the baseline for the 1981-82 FCPF Spring Small
Grains Pilot Experiment. Further exploratory testing needs to be conducted for
other crops of interest, such as corn and soybeans.
APPENDIX A

RESULTS OF THE FOUR ESTIMATION TECHNIQUES UNDER REFORMATTED PROCEDURE 

Because of biowindow restrictions, only nine segments were processible under
the reformatted procedure. Biases of proportion estimates (for fixed samples),
along with standard deviations and mean squared errors (MSE's), for these
segments are presented in table A-1. The errors of the proportion estimates are
shown in figures A-1 and A-2. When dot sets with labels from the reformatted
procedure were input, all the techniques produced large positive biases.
Although the estimates produced by techniques using CLASSY clustering were less
biased, there was no significant difference among the biases because of the
large variation in the errors; as can be seen, the standard deviation of the
proportion estimate errors for each technique was approximately 19 percent.
Labeling errors and the limited number of segments do not provide a sufficient
basis for evaluating the techniques when labels come from the reformatted
procedure. For completeness, however, comparable statistics are provided in
table A-1 for the same segments when ground-truth labels were used.
Interestingly, the standard deviations and MSE's were smaller when CLASSY
clustering was used.


TABLE A-1.- ACCURACY AND PRECISION OF THE REFORMATTED
PROCEDURE WITH AI LABELS AND GROUND-TRUTH LABELS

                                      AI labels              Ground-truth labels
Technique                         Bias  Std. dev.  MSE     Bias  Std. dev.  MSE
Random Sampling/
  Relative Count Estimator         9.1    19.4     436     -0.8     6.1      36
Proportional Allocation/
  Relative Count Estimator         6.2    19.2     382     -1.5     3.9      17
Proportional Allocation/
  Bayesian Estimator               6.0    18.8     369     -1.7     3.9      17
Bayesian Sequential Allocation/
  Bayesian Estimator               6.3    19.1     381     -2.7     4.0      22
Figure A-1.- Proportion estimation results with analyst labels (panels: Random Sample/Relative Count Estimator; Proportional Allocation/Relative Count Estimator).
APPENDIX B

RESULTS OF FOUR ESTIMATION TECHNIQUES UNDER VARIABLE SAMPLING OF SEGMENTS

Proportion estimates for segments with varying sample sizes were made only when
dot labels were obtained from the integrated procedure or ground-truth data.

Table B-1 presents biases, standard deviations, and MSE's for proportion
estimates made with sampling halted at a threshold (set at 0.0020) on the
internal MSE estimate.

Proportion errors are shown in figures B-1 and B-2. The results were similar to
those of the fixed sample size. The sample sizes averaged approximately 42 dots
and ranged from 25 to 75 dots.
APPENDIX C


1979 GROUND-TRUTH PROPORTIONS

           Ground-truth                 Other spring          Total spring
Segment    type (a)       Barley, %     small grains, % (b)   small grains, %
1387       D               8.01         35.36                 43.37
1392       D               2.02         28.28                 30.30
1394       I               0.31         39.51                 39.82
1457       I               3.15         38.24                 41.39
1461       I               4.99         48.19                 53.18
1467       D               3.09         48.46                 51.55
1472       I               4.02         35.16                 39.18
1473       D              11.69         39.74                 51.43
1485       I               1.35         20.80                 22.15
1514       D               4.92         22.77                 27.69
1518       D               0.29         25.22                 25.51
1524       D               0.00          6.96                  6.96
1571       I               0.32         14.60                 14.92
1612       I               0.00         16.03                 16.03
1617       D              21.18         39.68                 60.86
1619       D              10.39         39.76                 50.15
1627       I               0.00         15.80                 15.80
1630       I               0.67         16.80                 17.47
1636       I               0.87         38.91                 39.87
1653       I               0.00         16.13                 16.13
1658       I               1.44         32.41                 33.85
1664       D               1.94         33.50                 35.44
1676       I               0.23          7.44                  7.67
1755       I               6.55          5.64                 12.19
1784       I               4.07         17.29                 21.36
1825       D               6.20         19.95                 26.15
1835       D               6.61         19.02                 24.63
1843       D               0.75          5.13                  5.88
1909       I               0.88         17.15                 18.03
1918       I               1.14         13.80                 14.94
1920       I               0.09         21.11                 21.20
1924       I               1.01         36.75                 37.76
1948       D               1.95          5.57                  7.52
1974       I               4.48         35.25                 39.73
1987       D              15.48         34.40                 49.88

(a) D indicates 400-dot ground-truth proportions.
    I indicates inventoried ground-truth proportions from universal
    ground-truth tapes.
(b) Other spring small grains include spring wheat, oats, durum wheat,
    and flax.
APPENDIX D

CLUSTER PURITIES


To determine the appropriateness of a beta prior for cluster proportion
estimates, small-grain proportions for each cluster were found from
ground-truth data. The percentage of all clusters having small-grain
proportions within each five-hundredth (0.05) interval was then found; this
distribution is shown in figure D-1. The continuous line represents the shape
of a beta prior with a mean equal to the mean small-grain proportion estimate
for those segments (0.26). The beta prior is given by

$$ g(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\theta^{\alpha - 1}(1 - \theta)^{\beta - 1}, \qquad 0 < \theta < 1, $$

where $\alpha = 0.3513$ and $\beta = 1$, so that the prior mean is
$\alpha/(\alpha + \beta) = 0.3513/1.3513 \approx 0.26$.

As can be seen, the beta seems to be a reasonable prior.
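Fitting this prior amounts to choosing alpha so that the beta mean alpha/(alpha + beta) equals the observed mean cluster purity with beta fixed at 1, and the histogram in figure D-1 uses 0.05-wide intervals. The sketch below assumes exactly that (matching the mean only); the function names are hypothetical.

```python
import math
from typing import List


def alpha_for_mean(mean: float, beta: float = 1.0) -> float:
    """Solve alpha / (alpha + beta) = mean for alpha."""
    return beta * mean / (1.0 - mean)


def beta_pdf(theta: float, a: float, b: float) -> float:
    """Beta(a, b) density g(theta) for 0 < theta < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1) * (1 - theta) ** (b - 1)


def purity_histogram(purities: List[float], width: float = 0.05) -> List[float]:
    """Percentage of clusters whose small-grain proportion falls in each interval."""
    nbins = round(1 / width)
    counts = [0] * nbins
    for p in purities:
        counts[min(int(p / width), nbins - 1)] += 1  # a purity of exactly 1.0 goes in the last bin
    return [100 * c / len(purities) for c in counts]


alpha = alpha_for_mean(0.26)
print(round(alpha, 4))  # 0.3514
```

Matching a mean of exactly 0.26 gives 0.3514, close to the 0.3513 quoted above; the small difference presumably reflects rounding of the mean.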